Testing Dimension Reduction Methods for Text Retrieval

Author

  • Pavel Moravec
Abstract

In this paper, we compare the performance of several dimension reduction techniques, namely LSI, random projections and FastMap. The qualitative comparison is based on rank lists and is evaluated on a subset of the TREC 5 collection and the corresponding TREC 8 ad hoc queries. In addition, projection times and intrinsic dimensionality were measured to provide a common baseline for assessing the methods' usability.
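
To make one of the compared techniques concrete, the following is a minimal sketch of random projection followed by cosine ranking. It is an illustration only, not the implementation evaluated in the paper: the function names (`make_projection_matrix`, `project`, `cosine`, `rank`), the Gaussian projection matrix, and the use of cosine similarity for building the rank list are all assumptions made here.

```python
import math
import random


def make_projection_matrix(n_terms, k, seed=0):
    """Return a k x n_terms matrix with i.i.d. Gaussian entries (variance 1/k).

    Assumption: a dense Gaussian matrix; other random distributions
    are also used for random projection.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_terms)]
            for _ in range(k)]


def project(matrix, vector):
    """Map a term-space vector to the reduced k-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return document indices sorted by decreasing cosine similarity."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


if __name__ == "__main__":
    # Toy term-frequency vectors over a hypothetical 6-term vocabulary.
    docs = [[2, 0, 1, 0, 0, 3], [0, 1, 0, 4, 1, 0], [1, 0, 2, 0, 0, 1]]
    query = [1, 0, 1, 0, 0, 1]
    matrix = make_projection_matrix(n_terms=6, k=3, seed=42)
    reduced_docs = [project(matrix, d) for d in docs]
    reduced_query = project(matrix, query)
    print("rank list, original space:", rank(query, docs))
    print("rank list, reduced space: ", rank(reduced_query, reduced_docs))
```

Comparing the two printed rank lists is a small-scale analogue of a rank-list comparison: the closer the reduced-space ordering is to the original one, the less retrieval quality was lost in the projection.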

Similar resources

Dimension Reduction Methods of Text Documents by Neural Networks

This paper introduces different dimension reduction methods in the area of text document retrieval. First, the most commonly used text document retrieval models are described; the second part then describes the analytical approach and the neural network approaches to dimension reduction of the keyword space. Dimension reduction methods reduce the keyword space to a much smaller size together with...

Analysis of unsupervised dimensionality reduction techniques

Domains such as text and images contain large amounts of redundancy and ambiguity among their attributes, which result in considerable noise effects (i.e., the data is high-dimensional). Retrieving data from high-dimensional datasets is a big challenge. Dimensionality reduction techniques have been a successful avenue for automatically extracting the latent concepts by removing the noise and...

Dimension reduction based on centroids and least squares for efficient processing of text data

Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive data. In our previous work we proposed a mathematical framework for lower dimensional representations of text data in vector space based information retrieval, and a couple of dimension reduction methods using minimization and matrix rank reductio...

Lower Dimensional Representation of Text Data in Vector

Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and matrix rank reduction formula. We illustrate how the commonly used Latent ...

Text Document Retrieval by Document Space Dimension Reduction with Feed-Forward Neural Networks

The paper deals with text document retrieval from a given document collection using neural networks, namely a cascade neural network model, linear and nonlinear Hebbian neural networks, and a linear autoassociative neural network. Using neural networks, it is possible to reduce the dimension of the search space while preserving the highest retrieval accuracy.

Publication date: 2005